
introduce a new integrated "codeflash optimize" command #384


Merged: 34 commits merged from `trace-and-optimize` into `main` on Jul 4, 2025

Conversation

misrasaurabh1
Contributor

@misrasaurabh1 misrasaurabh1 commented Jun 26, 2025

PR Type

Enhancement, Tests


Description

  • Introduce integrated optimize CLI command

  • Add FunctionRanker class for ttX-based ranking

  • Extend tracer: generate replay test and invoke optimizer

  • Update tests: unit and end-to-end optimize flows


Changes diagram

flowchart LR
  CLI["codeflash optimize command"] --> TR["Tracer.trace code"]
  TR --> RT["Generate replay test"]
  RT --> FR["FunctionRanker.rerank_and_filter"]
  FR --> OPT["Optimizer.run_with_args"]
  OPT --> OUT["Optimization results"]

Changes walkthrough 📝

Relevant files
Enhancement
7 files
workload.py
Add sleep and heavy compute in SimpleModel.predict             
+10/-1   
function_ranker.py
New `FunctionRanker` for function profiling ranking           
+144/-0 
cli.py
Add `optimize` subcommand and CI flag parsing                       
+10/-1   
env_utils.py
Add `is_ci()` helper for CI environment detection               
+6/-0     
functions_to_optimize.py
Integrate trace path and rerank functions workflow             
+53/-3   
tracer.py
Support replay tests, static methods, optimization chaining
+51/-8   
profile_stats.py
Include `class_name` and normalize caller keys                     
+14/-2   
Configuration changes
2 files
config_consts.py
Define `DEFAULT_IMPORTANCE_THRESHOLD` constant                     
+1/-0     
pyproject.toml
Configure pytest warning filters                                                 
+6/-0     
Formatting
3 files
server.py
Fix import order and comma formatting in LSP server           
+7/-6     
server_entry.py
Change `setup_logging` return type signature                         
+4/-3     
posthog_cf.py
Clean up trailing comments and commas                                       
+2/-2     
Miscellaneous
1 file
pickle_patcher.py
Remove debug print in placeholder creation                             
+0/-2     
Tests
3 files
end_to_end_test_tracer_replay.py
Update traced count and coverage expectations                       
+2/-2     
end_to_end_test_utilities.py
Switch to `codeflash.main optimize` invocation                     
+10/-24 
test_function_ranker.py
Add unit tests for `FunctionRanker` methods                           
+172/-0 
Additional files
1 file
codeflash.trace [link]   

@misrasaurabh1 misrasaurabh1 marked this pull request as draft June 26, 2025 03:50
    github-actions bot commented Jun 26, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 9addd95)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
    🧪 PR contains tests
    🔒 Security concerns

    Sensitive information exposure:
    A live PostHog API key is committed in codeflash/telemetry/posthog_cf.py, which could be abused if extracted. Replace with secure retrieval (e.g., environment variable) and remove the embedded key.

    ⚡ Recommended focus areas for review

    Sensitive Info

    The PostHog project API key is hardcoded in source, risking exposure of credentials.

    _posthog = Posthog(project_api_key="phc_aUO790jHd7z1SXwsYCz8dRApxueplZlZWeDSpKc5hol", host="https://us.posthog.com")
    CLI Parsing

    Using parse_known_args with sys.argv reassignment may drop or reorder flags, leading to unexpected behavior for subcommands.

    args, unknown_args = parser.parse_known_args()
    sys.argv[:] = [sys.argv[0], *unknown_args]
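The hazard can be shown with a minimal, self-contained reproduction (flag names here are hypothetical, not the project's actual parser): any flag the outer parser recognizes is consumed by `parse_known_args`, so a downstream subcommand that re-parses `sys.argv` never sees it.

```python
import argparse
import sys

# Outer parser knows --verbose; a hypothetical subcommand also wants it.
parser = argparse.ArgumentParser()
parser.add_argument("--verbose", action="store_true")

sys.argv = ["codeflash", "--verbose", "--replay-test", "tests/replay.py"]
args, unknown_args = parser.parse_known_args()
sys.argv[:] = [sys.argv[0], *unknown_args]

# --verbose was consumed by the outer parser and is gone for any
# subcommand parser that re-reads sys.argv after this point.
print(sys.argv)  # ['codeflash', '--replay-test', 'tests/replay.py']
```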
    Performance Bottleneck

    The nested loops and sleep in SimpleModel.predict will significantly slow benchmarking and may skew optimization results.

    sleep(0.1) # can be optimized away
    for i in range(500):
        for x in data:
            computation = 0
            computation += x * i ** 2
            result.append(computation)
    return result
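The snippet above is a deliberately slow benchmarking fixture; as a hedged sketch (function names here are illustrative, not the project's code), an optimizer can drop the sleep and collapse the loop body into an equivalent comprehension because `computation` is reset on every iteration:

```python
def predict_slow(data):
    # Mirrors the fixture above: computation is reassigned each inner
    # iteration, so each appended value is just x * i ** 2.
    result = []
    for i in range(500):
        for x in data:
            computation = 0
            computation += x * i ** 2
            result.append(computation)
    return result


def predict_fast(data):
    # Same values in the same order (i outer, x inner), no dead stores.
    return [x * i * i for i in range(500) for x in data]
```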


    github-actions bot commented Jun 26, 2025

    PR Code Suggestions ✨

    Latest suggestions up to 9addd95
    Explore these optional code suggestions:

    Security
    Load API key from environment

    Avoid committing sensitive API keys to source. Load the PostHog project_api_key from
    an environment variable at runtime and error out if it's missing.

    codeflash/telemetry/posthog_cf.py [24]

    -_posthog = Posthog(project_api_key="phc_aUO790jHd7z1SXwsYCz8dRApxueplZlZWeDSpKc5hol", host="https://us.posthog.com")
    +import os
     
    +project_api_key = os.getenv("POSTHOG_API_KEY")
    +if not project_api_key:
    +    logger.error("POSTHOG_API_KEY environment variable is not set")
    +    return
    +_posthog = Posthog(project_api_key=project_api_key, host="https://us.posthog.com")
    +
    Suggestion importance[1-10]: 9


    Why: Hardcoding the PostHog key is a security risk; loading from POSTHOG_API_KEY ensures credentials aren’t committed and fails fast if not set.

Impact: High
    Possible issue
    Support both schema versions

    The code now unpacks an extra class_name column which may not exist in the pstats
    schema, leading to unpack errors. Handle both schema versions by inspecting row
    length and defaulting class_name when missing.

    codeflash/tracing/profile_stats.py [26-36]

    -for (
    -    filename,
    -    line_number,
    -    function,
    -    class_name,
    -    call_count_nonrecursive,
    -    num_callers,
    -    total_time_ns,
    -    cumulative_time_ns,
    -    callers,
    -) in pdata:
    +for row in pdata:
    +    # support both old and new schemas
    +    if len(row) == 8:
    +        filename, line_number, function, call_count_nonrecursive, num_callers, total_time_ns, cumulative_time_ns, callers = row
    +        class_name = None
    +    else:
    +        filename, line_number, function, class_name, call_count_nonrecursive, num_callers, total_time_ns, cumulative_time_ns, callers = row
    +    loaded_callers = json.loads(callers)
    +    ...
    Suggestion importance[1-10]: 6


    Why: The new class_name unpack assumes an updated schema and may break older traces; checking row length and defaulting class_name maintains backward compatibility.

Impact: Low

    Previous suggestions

    Suggestions up to commit 4debe7e
    Possible issue
    Include optimize subcommand in invocation

    Include the "optimize" subcommand in the sys.argv array so the top-level CLI parser
    recognizes it and invokes the optimization phase correctly.

    codeflash/tracer.py [873]

    -sys.argv = ["codeflash", "--replay-test", str(replay_test_path)]
    +sys.argv = ["codeflash", "optimize", "--replay-test", str(replay_test_path)]
    Suggestion importance[1-10]: 8


    Why: The CLI invocation omits the "optimize" subcommand causing the top-level parser to misinterpret "--replay-test", so including it is critical for correct workflow.

Impact: Medium
    General
    Limit sys.argv mutation to optimize

    Restrict the mutation of sys.argv to only when the "optimize" command is active, so
    other subcommands receive their intended arguments and avoid unintended stripping.

    codeflash/cli_cmds/cli.py [73-74]

     args, unknown_args = parser.parse_known_args()
    -sys.argv[:] = [sys.argv[0], *unknown_args]
    +if args.command == "optimize":
    +    sys.argv[:] = [sys.argv[0], *unknown_args]
    Suggestion importance[1-10]: 6


    Why: Restricting the global sys.argv rewrite to the optimize command prevents other subcommands from losing their arguments and improves correctness.

Impact: Low
    Hoist tracer_main import

    Move the import of tracer_main to the module level to avoid repeated imports each
    time parse_args is called and clarify dependencies.

    codeflash/cli_cmds/cli.py [26-27]

    -trace_optimize = subparsers.add_parser("optimize", help="Trace and optimize a Python project.")
     from codeflash.tracer import main as tracer_main
     
    +trace_optimize = subparsers.add_parser("optimize", help="Trace and optimize a Python project.")
    +
    Suggestion importance[1-10]: 3


    Why: Moving the import to module scope reduces repeated imports in parse_args, but the performance gain is minimal and context clarity is modest.

Impact: Low

    @KRRT7 KRRT7 force-pushed the trace-and-optimize branch from 08464c4 to 0b4fcb6 Compare June 30, 2025 19:04
    codeflash-ai bot added a commit that referenced this pull request Jun 30, 2025
    …(`trace-and-optimize`)
    
    Here is an optimized rewrite of your `FunctionRanker` class.  
    **Key speed optimizations applied:**
    
    1. **Avoid repeated loading of function stats:**  
       The original code reloads function stats for each function during ranking (`get_function_ttx_score()` is called per function and loads/returns). We prefetch stats once in `rank_functions()` and reuse them for all lookups.
    
    2. **Inline and batch lookups:**  
       We use a helper to batch compute scores directly via a pre-fetched `stats` dict. This removes per-call overhead from attribute access and creation of possible keys inside the hot loop.
    
    3. **Minimal string operations:**  
       We precompute the two possible key formats needed for lookup (file:qualified and file:function) for all items only ONCE, instead of per invocation.
    
    4. **Skip list-comprehension in favor of tuple-unpacking:**  
       Use generator expressions for lower overhead when building output.
    
    5. **Fast path with `dict.get()` lookup:**  
       Avoid redundant `if key in dict` by just trying `dict.get(key)`.
    
6. **Behavior preserved:**
   No signatures are changed and no classes or functions are renamed; all logging, ordering, and functionality is preserved.
    
    
    
    
    **Summary of performance impact:**  
    - The stats are loaded only once, not per function.
    - String concatenations for keys are only performed twice per function (and not redundantly in both `rank_functions` and `get_function_ttx_score`).
    - All lookup and sorting logic remains as in the original so results will match, but runtime (especially for large lists) will be significantly better.  
    - If you want, you could further optimize by memoizing scores with LRU cache, but with this design, dictionary operations are already the bottleneck, and this is the lowest-overhead idiomatic Python approach.  
    - No imports, function names, or signatures are changed.  
    
    Let me know if you need further GPU-based or numpy/pandas-style speedups!
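The prefetch-and-batch pattern the commit describes can be sketched as follows; the helper name, the function-record fields, and the stats layout are illustrative assumptions, not the project's actual `FunctionRanker` API:

```python
def rank_functions(functions, stats):
    """Rank functions by a precomputed ttX score, loading stats only once.

    `stats` is a dict fetched a single time by the caller, so no per-function
    reload happens inside the loop.
    """
    scored = []
    for fn in functions:
        # Precompute both candidate key formats exactly once per function.
        qualified_key = f"{fn['file']}:{fn['qualified_name']}"
        simple_key = f"{fn['file']}:{fn['name']}"
        # dict.get avoids the redundant `key in stats` membership test.
        entry = stats.get(qualified_key) or stats.get(simple_key)
        score = entry["ttx"] if entry else 0.0
        scored.append((score, fn))
    # Highest ttX score first; ordering of the original logic is preserved.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fn for _, fn in scored]
```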
    Contributor

    codeflash-ai bot commented Jun 30, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 13% (0.13x) speedup for FunctionRanker.rank_functions in codeflash/benchmarking/function_ranker.py

⏱️ Runtime: 1.84 milliseconds → 1.62 milliseconds (best of 67 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch trace-and-optimize).

    KRRT7 and others added 5 commits June 30, 2025 16:06
    …384 (`trace-and-optimize`)
    
Here is an **optimized** version of your code, focusing on the `_get_function_stats` function, the proven performance bottleneck per your line profiling.
    
    ### Optimizations Applied
    
    1. **Avoid Building Unneeded Lists**:  
       - Creating `possible_keys` as a list incurs per-call overhead.  
       - Instead, directly check both keys in sequence, avoiding the list entirely.
    
    2. **Short-circuit Early Return**:  
       - Check for the first key (`qualified_name`) and return immediately if found (no need to compute or check the second unless necessary).
    
    3. **String Formatting Optimization**:  
       - Use f-strings directly in the condition rather than storing/interpolating beforehand.
    
    4. **Comment Retention**:  
       - All existing and relevant comments are preserved, though your original snippet has no in-method comments.
    
    ---
    
    
    
    ---
    
    ### Rationale
    
    - **No lists** or unneeded temporary objects are constructed.
    - Uses `.get`, which is faster than `in` + lookup.
    - Returns immediately upon match.
    
    ---
    
    **This change will reduce total runtime and memory usage significantly in codebases with many calls to `_get_function_stats`.**  
    Function signatures and return values are unchanged.
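A hedged before/after sketch of the list-free, short-circuiting lookup the commit describes (function and parameter names are hypothetical, not the project's actual signatures):

```python
def get_stats_before(stats, module, qualified_name, name):
    # Original pattern: builds a temporary list, then does `in` + lookup
    # (two dict probes per hit).
    possible_keys = [f"{module}:{qualified_name}", f"{module}:{name}"]
    for key in possible_keys:
        if key in stats:
            return stats[key]
    return None


def get_stats_after(stats, module, qualified_name, name):
    # Optimized pattern: no intermediate list; .get() probes once and we
    # return immediately on the first match, so the second key is only
    # formatted when actually needed.
    hit = stats.get(f"{module}:{qualified_name}")
    if hit is not None:
        return hit
    return stats.get(f"{module}:{name}")
```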
    Contributor

    codeflash-ai bot commented Jul 1, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 51% (0.51x) speedup for FunctionRanker._get_function_stats in codeflash/benchmarking/function_ranker.py

⏱️ Runtime: 497 microseconds → 330 microseconds (best of 51 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch trace-and-optimize).

    …25-07-01T22.08.43
    
    ⚡️ Speed up method `FunctionRanker._get_function_stats` by 51% in PR #384 (`trace-and-optimize`)
    Contributor

    codeflash-ai bot commented Jul 1, 2025


    github-actions bot commented Jul 2, 2025

    Persistent review updated to latest commit 9addd95

    codeflash-ai bot added a commit that referenced this pull request Jul 3, 2025
    …imize`)
    
    Here's an optimized version of your Python program, focused on runtime and memory.
    
    **Key changes:**
    - Avoids reading the event file or parsing JSON if not needed.
    - Reads the file as binary and parses with `json.loads()` for slightly faster IO.
    - References the `"draft"` property directly using `.get()` to avoid possible `KeyError`.
    - Reduces scope of data loaded from JSON for less memory usage.
    - Caches the result of parsing the event file for repeated calls within the same process.
    
    
    
    - The inner try/except is kept close to only catching the specific case.
    - Results for each event_path file are cached in memory.
    - Exception handling and comments are preserved where their context is changed.
    - I/O and JSON parsing is only done if both env vars are set and PR number exists.
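The caching approach the commit describes can be sketched like this; the helper name and the exact short-circuit conditions are illustrative assumptions, not the project's actual `env_utils` code:

```python
import json
from functools import lru_cache


@lru_cache(maxsize=None)
def _load_event(event_path):
    # Parse the GitHub event payload at most once per path per process;
    # binary read + json.loads is marginally faster than text-mode json.load.
    with open(event_path, "rb") as f:
        return json.loads(f.read())


def is_pr_draft(event_path, pr_number):
    # Skip file I/O and JSON parsing entirely when the environment did not
    # provide an event path or a PR number.
    if not event_path or pr_number is None:
        return False
    event = _load_event(event_path)
    # .get() avoids a KeyError when the payload has no pull_request section.
    return bool(event.get("pull_request", {}).get("draft", False))
```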
    Contributor

    codeflash-ai bot commented Jul 3, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 121% (1.21x) speedup for is_pr_draft in codeflash/code_utils/env_utils.py

⏱️ Runtime: 4.98 milliseconds → 2.25 milliseconds (best of 94 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch trace-and-optimize).

    Contributor

    KRRT7 commented Jul 3, 2025

    @misrasaurabh1 ready to review, can't tag you normally b/c you're the author

    @misrasaurabh1 misrasaurabh1 requested a review from aseembits93 July 3, 2025 22:48
    @misrasaurabh1 misrasaurabh1 enabled auto-merge July 3, 2025 22:51
    pyproject.toml Outdated
[tool.pytest.ini_options]
filterwarnings = [
    "ignore::pytest.PytestCollectionWarning",
    "ignore::pytest.PytestUnknownMarkWarning",
]
Contributor
    reasoning for PytestUnknownMarkWarning @KRRT7 ?

Contributor

@KRRT7 KRRT7 Jul 3, 2025

    in codeflash/models/models.py we have a few Classes that are prefixed with Test, like

    @dataclass(frozen=True)
    class TestsInFile:
        test_file: Path
        test_class: Optional[str]
        test_function: str
        test_type: TestType

    which pytest complains about :
[Screenshot, 2025-07-03: pytest collection-warning output]
    it is very noisy so I've disabled them
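An alternative to filtering the warning globally is to mark such classes as non-tests explicitly: pytest skips collecting any class whose `__test__` attribute is `False`. A minimal sketch based on the `TestsInFile` dataclass quoted above (the `TestType` field is omitted here so the example is self-contained):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class TestsInFile:
    # No annotation, so this is a plain class attribute rather than a
    # dataclass field; pytest honors it and will not try to collect the class.
    __test__ = False

    test_file: Path
    test_class: Optional[str]
    test_function: str
```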

Contributor

    any examples of PytestUnknownMarkWarning specifically you've seen so far. I've not seen them yet.

Contributor

    it gets triggered on the pytests that have the skip_ci markers

    @misrasaurabh1 misrasaurabh1 disabled auto-merge July 4, 2025 03:01
    @misrasaurabh1 misrasaurabh1 merged commit 75810a3 into main Jul 4, 2025
    16 of 17 checks passed

    3 participants